Text localization using standard deviation analysis of structure elements and support vector machines

نویسندگان

  • Konstantinos Zagoris
  • Savvas A. Chatzichristofis
  • Nikos Papamarkos
چکیده

A text localization technique is required to successfully exploit document images such as technical articles and letters. The proposed method detects and extracts text areas from document images. Initially a connected components analysis technique detects blocks of foreground objects. Then, a descriptor that consists of a set of suitable document structure elements is extracted from the blocks. This is achieved by incorporating an algorithm called Standard Deviation Analysis of Structure Elements (SDASE) which maximizes the separability between the blocks. Another feature of the SDASE is that its length adapts according to the requirements of the application. Finally, the descriptor of each block is used as input to a trained support vector machines that classify the block as text or not. The proposed technique is also capable of adjusting to the text structure of the documents. Experimental results on benchmarking databases demonstrate the effectiveness of the proposed method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are most important portion of biological sequences, which play crucial roles on corresponding sequence’s fold and functionality. Biggest class of the repetitive subsequences is “Transposable Elements” which has its own sub-classes upon contexts’ structures. Many researches have been performed to criticality determine the structure and function of repetitiv...

متن کامل

STAGE-DISCHARGE MODELING USING SUPPORT VECTOR MACHINES

Establishment of rating curves are often required by the hydrologists for flow estimates in the streams, rivers etc. Measurement of discharge in a river is a time-consuming, expensive, and difficult process and the conventional approach of regression analysis of stage-discharge relation does not provide encouraging results especially during the floods. P

متن کامل

Face Recognition using Eigenfaces , PCA and Supprot Vector Machines

This paper is based on a combination of the principal component analysis (PCA), eigenface and support vector machines. Using N-fold method and with respect to the value of N, any person’s face images are divided into two sections. As a result, vectors of training features and test features are obtain ed. Classification precision and accuracy was examined with three different types of kernel and...

متن کامل

A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels

The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single Layer Feed-forward Neural Network (SLFNN) with a high level of training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...

متن کامل

کاربرد الگوریتم‌های داده‌کاوی در تفکیک منابع رسوبی حوزۀ آبخیز نوده گناباد

Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription and identification of critical areas within the watersheds. The sediment source ascription is involves two...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • EURASIP J. Adv. Sig. Proc.

دوره 2011  شماره 

صفحات  -

تاریخ انتشار 2011